Every high-rise building must meet construction requirements, i.e. it must
provide adequate safety against unexpected events such as fire. To
prevent a fire from growing, surveillance using closed circuit
television (CCTV) videos is necessary. However, it is impossible for
security personnel to monitor the videos for a full day. Deep learning is one
method that can help security personnel. In this study, we use two
deep learning methods to detect fire hotspots, i.e. the you only look once
(YOLO) method and the faster region-based convolutional neural network
(faster R-CNN) method. In the first stage, we collected 100 images
(70 training data and 30 test data). The next stage is model training, which
aims to make the model recognize fire. We then calculate precision,
recall, accuracy, and F1 score to measure the performance of the model. An F1
score close to 1 indicates an optimal balance between precision and recall. In our
experiments, we found that the YOLO method has a precision of 100%, a recall of 54.54%,
an accuracy of 66.67%, and an F1 score of 0.70583667, while the faster R-CNN method has
a precision of 87.5%, a recall of 95.45%, an accuracy of 86.67%, and an F1 score of 0.913022.

1. INTRODUCTION

In areas with very large populations, buildings are built vertically, i.e. as high-rise buildings,
because vacant land is decreasing [1]. However, high-rise buildings pose problems for the safe evacuation
of occupants during emergencies such as fire. Fire is an unexpected disaster and must be handled quickly so
that it does not spread. If not handled quickly, the fire will spread from one floor to another, making the
evacuation process more difficult. Therefore, fire is a very serious problem that must be handled in a
timely manner to avoid loss of life and property [2]. Every high-rise building must meet technical
requirements regarding its readiness to face a fire disaster, in both infrastructure and facilities. One
example is a fire detection system, such as a sensor that provides an early warning of a fire in the
building so that the fire can be resolved quickly. However, this system has a weakness: when the fire
gets bigger, it damages the sensors installed in the building [3]. Several studies have developed fire
detection systems using computer vision to overcome the weakness of the fire alarm sensor.
Computer vision can be used to monitor fires remotely using closed circuit television (CCTV)
videos. However, it is impossible for security personnel to monitor CCTV videos for a full day. Therefore, in
this study, we use deep learning to find out whether fire hotspots are recorded in CCTV
videos.
Deep learning is a subset of machine learning whose concept resembles how the human brain
works, which is why its models are called artificial neural networks [4]. Currently, deep learning is widely
used in research on tasks such as decision making, speech recognition, and object detection. One of the
methods used for fire hotspot detection is the you only look once (YOLO) method, which is a modification of
the convolutional neural network (CNN). YOLO uses a single neural network for both localization and
classification of objects in the frame [5]. The network in this method contains 24 convolutional layers
[6]-[8]. Lestari et al. [7] used the YOLO method to detect fire hotspots with 45 images of fire objects,
divided into 30 training data and 15 testing data, and obtained an accuracy of 90%. However, the dataset
was small and training still ran on a central processing unit (CPU). Therefore, in our study, we use a
larger and more diverse dataset and a graphics processing unit (GPU). We also compare the YOLO method with
the faster region-based convolutional neural network (faster R-CNN) method. The two methods are compared by
evaluating their performance, i.e. measuring precision, recall, accuracy, and F1 score. The faster R-CNN
method is a development of the fast region-based convolutional neural network (fast R-CNN) [9]. Its
architecture consists of two parts. The first part, the region proposal network, decides which locations to
examine, reducing the computation of the whole inference process so that each location can be scanned
quickly and efficiently [10]. The second part is fast R-CNN, which is used to classify the proposals. Faster
R-CNN has 9 anchors, consisting of 3 scales and 3 ratios, which allow this method to detect objects more
accurately [11]-[13]. R-CNN-based methods generate bounding boxes (BBs) around the detected objects [14].
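
As a rough illustration of how 3 scales and 3 ratios yield 9 anchors, the following minimal sketch enumerates the combinations. The scale values (128, 256, 512) and ratio values (0.5, 1, 2) are assumptions taken from common faster R-CNN configurations, not values reported in this study, and `generate_anchors` is a hypothetical helper name.

```python
import math


def generate_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return one (width, height) pair per scale/ratio combination.

    The default scales and ratios are illustrative assumptions.
    """
    anchors = []
    for scale in scales:
        area = float(scale * scale)
        for ratio in ratios:
            # ratio is height / width; the anchor area stays at scale**2
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((width, height))
    return anchors


anchors = generate_anchors()
print(len(anchors))  # 3 scales x 3 ratios
```

Each anchor keeps the area of its scale while changing its aspect ratio, so the same location can be matched against objects of different shapes.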

Several studies have addressed fire detection. One study detected smoke using synthetic smoke
images: a synthesis pipeline was built to simulate a variety of smoke conditions. The data
were categorized into two classes, i.e. smoke and not smoke. In the test, the not-smoke category interfered
strongly with smoke detection [15]. Appana et al. [16] detected
smoke in video using the pattern of smoke flow for alarm systems. They used three attributes to
build the smoke detection system, i.e. color, blur, and diffusion behavior. The first stage analyzes color,
then features are extracted using the Gabor filtering method to obtain a feature vector. The final stage
classifies the types of smoke using a support vector machine (SVM) [16]. Hendri [17] studied
forest fire detection using the CNN method. This method uses
reclassification to detect hotspots: to detect an object, the system takes a classifier for that object
and evaluates it at various locations and scales in the frame. In his research, fire detection
using the CNN method reached an accuracy of about 54%. Mohammed et al. [18]
detected forest fires on geodata using machine learning methods such as SVM and k-nearest neighbors (KNN),
obtaining an accuracy of 74% for the SVM model and 58% for the KNN model.
Kadir et al. [19] used a wireless sensor network (WSN) to detect
forest fires. WSN technology is used in sensor systems to collect environmental data. Hotspot detection
is trained in the data center to identify fire hotspots that have the potential to
become major fires. However, a large fire can damage the sensor devices. As this overview shows,
previous researchers detected fire hotspots using conventional methods, i.e. SVM, KNN, and
CNN. Therefore, in this study, fire hotspots are detected using more recent methods, i.e. the YOLO
method and the faster R-CNN method.

Other studies have also addressed fire detection: Li and Zhao [20] used the SSD
method, Gagliardi et al. [21] used the Kalman filter and a CNN algorithm, Saponara et al. [22] used the
YOLOv2 method, Park and Ko [23] used the YOLOv3 method, and Zhong et al. [24] used the CNN method.
However, previous researchers only detected fires outdoors, i.e. in the surrounding environment and
in forests, and did not detect fires indoors, such as in buildings. Therefore, in this study, we propose to
detect fire hotspots that appear indoors using the YOLO method and the faster R-CNN method.


2. RESEARCH METHOD 

In this study, we compare two methods of detecting fire hotspots, i.e. the YOLO method
and the faster R-CNN method. The general framework of this research is shown in Figure 1. The first stage
is collecting data. The data are 100 random images of fire objects.
The data are divided into 70 training data (70%) and 30 testing data (30%).
After obtaining the training and testing data, we label the images in the training data.
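
The 70/30 division described above can be sketched as follows. This is only an illustration: the file names, the fixed random seed, and the helper `split_dataset` are hypothetical, and the study does not state how the images were assigned to each subset.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle the items and split them into training and testing subsets."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]


# Hypothetical file names standing in for the 100 collected images
image_names = [f"img{i}.jpg" for i in range(1, 101)]
train, test = split_dataset(image_names)
print(len(train), len(test))
```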


2.1. Labeling image 


The next stage is to label the training dataset by drawing a bounding box around each
object to be recognized. The labeling results contain the position of the object to be
detected and are stored in .xml form, as shown in Figure 2. After that, we perform transfer learning using
the YOLO method and the faster R-CNN method.


[Figure 1: flowchart with the stages Collecting Dataset; Labeling Image; Tiny YOLO Pre Train Model (Network Code); Faster R-CNN Inception V2; Convert The Data to Tensorflow Record Format; Prediction Model; Testing Dataset]

Figure 1. General framework of this research 


<annotation>
    <folder>images</folder>
    <filename>img6.jpg</filename>
    <path>D:\yolo_code dan manual book\yolo_code dan manual book\darkflow-master\train\images\img6.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>Fire</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>466</xmin>
            <ymin>300</ymin>
            <xmax>493</xmax>
            <ymax>315</ymax>
        </bndbox>
    </object>
</annotation>


Figure 2. Example of labeling format in XML 
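
A label file of this form can be read back with the Python standard library, as in the minimal sketch below. The embedded sample is a shortened version of Figure 2; the `xmax` and `ymax` values are read from a partly illegible figure and should be treated as assumptions, and `parse_annotation` is a hypothetical helper, not part of the tooling used in this study.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on Figure 2 (xmax and ymax are assumed values)
SAMPLE = """<annotation>
  <filename>img6.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>Fire</name>
    <bndbox>
      <xmin>466</xmin><ymin>300</ymin><xmax>493</xmax><ymax>315</ymax>
    </bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Return the file name and a list of labeled boxes from one annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        corners = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append({"label": obj.findtext("name"), "box": corners})
    return root.findtext("filename"), boxes


filename, boxes = parse_annotation(SAMPLE)
print(filename, boxes)
```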


2.2. Transfer learning with YOLO method 

To detect objects, the YOLO method splits the image into a grid of size SxS [25].
Each grid cell predicts bounding boxes, each with a confidence value. The confidence value is the
probability that an object is in the bounding box, weighted by how well the box fits the object, as in (1).
If the centroid of the fire object falls in a grid cell, that cell is responsible for detecting the fire object.


$$CV = \Pr(\text{Object}) \times \mathrm{IOU}_{\text{pred}}^{\text{truth}} \qquad (1)$$

IOU is the area of intersection between the predicted bounding box and the ground truth divided by the area
of their union. IOU ranges from 0 to 1, and the closer it is to 1, the closer the predicted bounding box is
to the ground truth [26]-[28]. We also define the class probability for each grid cell in (2):


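$$\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \mathrm{IOU}_{\text{pred}}^{\text{truth}} = \Pr(\text{Class}_i) \times \mathrm{IOU}_{\text{pred}}^{\text{truth}} \qquad (2)$$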
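
The quantities in (1) and (2) can be illustrated with a short sketch. It assumes boxes given as `(xmin, ymin, xmax, ymax)` corner coordinates; the function names are hypothetical and the code is not the implementation used in this study.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def confidence_value(pr_object, predicted_box, truth_box):
    """Equation (1): Pr(Object) multiplied by the IOU with the ground truth."""
    return pr_object * iou(predicted_box, truth_box)


def class_confidence(pr_class_given_object, pr_object, predicted_box, truth_box):
    """Left-hand side of equation (2): class probability times the confidence value."""
    return pr_class_given_object * confidence_value(pr_object, predicted_box, truth_box)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

In the printed example, two 2x2 boxes overlap in a 1x1 square, so the intersection is 1 and the union is 4 + 4 - 1 = 7.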
The YOLO method has 24 convolutional layers followed by 2 fully connected layers [29], and it has a fast
version designed to find the boundaries of detected objects quickly [6]. One example of the fast version is
the tiny YOLO model, which has 9 convolutional layers [30], as shown in Table 1. The tiny
YOLO model consists of network code and pre-trained weights.

We use the tiny YOLOv3 model as the pre-trained model in the transfer learning process.
In transfer learning, the pre-trained model learns to recognize fire objects in the labeled training data,
as in Figure 1 (red box). Transfer learning requires a batch size and a learning rate.
We use a batch size of 1 because the images are very large, so that each image sample
passes through the network quickly during training, and we use a small learning rate, i.e. 0.0002.
A smaller learning rate makes it more likely that the value of the loss function decreases after each
update. The model is then trained repeatedly so that it recognizes
fire objects well. In this study, we use the loss function shown in (3) to measure the performance of the
model [7]. The model's performance improves as the loss value falls below 1 and approaches 0.


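$$
\begin{aligned}
\text{Loss} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (r_i - \hat{r}_i)^2 + (s_i - \hat{s}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{t_i} - \sqrt{\hat{t}_i}\right)^2 + \left(\sqrt{v_i} - \sqrt{\hat{v}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(CV_i - \widehat{CV}_i\right)^2
 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(CV_i - \widehat{CV}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
\qquad (3)
$$

where S is the size of the grid, B is the number of bounding boxes predicted per grid cell, r and s are the
center coordinates of each prediction, and t and v are the dimensions of the bounding box. The
$\lambda_{\text{coord}}$ variable increases the contribution of bounding boxes that contain a fire object,
and the $\lambda_{\text{noobj}}$ variable decreases the contribution of bounding boxes that contain no fire
object. CV is the confidence value and $p_i(c)$ is the class prediction.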

The loss value shows how well the pre-trained model (the tiny YOLO model) learns
to recognize fire objects. After the learning process is complete, the training produces a new model
that can recognize fire objects. The new model is used to predict whether an image contains
fire objects. In this research, we run the YOLO method in Python.
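
To make the structure of the loss in (3) concrete, the sketch below evaluates it for a list of grid cells. It is a simplified illustration, not the training code of this study: it assumes one bounding box per grid cell, and the default weights `lambda_coord=5.0` and `lambda_noobj=0.5` are assumed values commonly used with YOLO rather than values reported here.

```python
import math


def yolo_loss(cells, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-of-squares loss over (prediction, truth) pairs, one box per cell.

    Simplifying assumption: a single box per grid cell. The lambda defaults
    are illustrative, not taken from this study.
    """
    loss = 0.0
    for pred, truth in cells:
        conf_error = (pred["cv"] - truth["cv"]) ** 2
        if truth["has_object"]:
            # Center coordinates (r, s)
            loss += lambda_coord * (
                (pred["r"] - truth["r"]) ** 2 + (pred["s"] - truth["s"]) ** 2
            )
            # Box dimensions (t, v), compared through their square roots
            loss += lambda_coord * (
                (math.sqrt(pred["t"]) - math.sqrt(truth["t"])) ** 2
                + (math.sqrt(pred["v"]) - math.sqrt(truth["v"])) ** 2
            )
            # Confidence value and class probabilities
            loss += conf_error
            loss += sum(
                (p - q) ** 2 for p, q in zip(pred["classes"], truth["classes"])
            )
        else:
            # Cells without an object only contribute a down-weighted confidence error
            loss += lambda_noobj * conf_error
    return loss


fire_cell = (
    {"r": 0.5, "s": 0.5, "t": 0.25, "v": 0.25, "cv": 0.8, "classes": [0.9]},
    {"has_object": True, "r": 0.5, "s": 0.5, "t": 0.25, "v": 0.25, "cv": 1.0, "classes": [1.0]},
)
empty_cell = ({"cv": 0.2}, {"has_object": False, "cv": 0.0})
print(yolo_loss([fire_cell, empty_cell]))
```

In this example the box position and size are predicted exactly, so the loss consists only of the confidence and class errors: 0.04 + 0.01 for the cell with fire and 0.5 x 0.04 for the empty cell.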


Table 1. The architecture of tiny YOLO model
Layer     Shape             Stride  Kernel
Input     (416, 416, 3)
Conv      (416, 416, 16)    1       3x3
MaxPool   (208, 208, 16)    2       2x2
Conv      (208, 208, 32)    1       3x3
MaxPool   (104, 104, 32)    2       2x2
Conv      (104, 104, 64)    1       3x3
MaxPool   (52, 52, 64)      2       2x2
Conv      (52, 52, 128)     1       3x3
MaxPool   (26, 26, 128)     2       2x2
Conv      (26, 26, 256)     1       3x3
MaxPool   (13, 13, 256)     2       2x2
Conv      (13, 13, 512)     1       3x3
MaxPool   (13, 13, 512)     1       2x2
Conv      (13, 13, 1024)    1       3x3
Conv      (13, 13, 1024)    1       3x3
Conv      (13, 13, 125)     1       1x1
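
The shapes in Table 1 can be checked by tracing an input through the listed layers. The sketch below assumes "same" padding, so that only the stride changes the spatial size; this padding choice is an assumption that is consistent with the table but is not stated in it, and `trace_shapes` is a hypothetical helper.

```python
# (layer type, number of filters for Conv, stride), in the order of Table 1
LAYERS = [
    ("Conv", 16, 1), ("MaxPool", None, 2),
    ("Conv", 32, 1), ("MaxPool", None, 2),
    ("Conv", 64, 1), ("MaxPool", None, 2),
    ("Conv", 128, 1), ("MaxPool", None, 2),
    ("Conv", 256, 1), ("MaxPool", None, 2),
    ("Conv", 512, 1), ("MaxPool", None, 1),
    ("Conv", 1024, 1), ("Conv", 1024, 1), ("Conv", 125, 1),
]


def trace_shapes(input_shape=(416, 416, 3), layers=LAYERS):
    """Return the output shape after each layer, assuming 'same' padding."""
    height, width, channels = input_shape
    shapes = []
    for kind, filters, stride in layers:
        # With 'same' padding the spatial size is ceil(size / stride)
        height = -(-height // stride)
        width = -(-width // stride)
        if kind == "Conv":
            channels = filters  # pooling keeps the channel count
        shapes.append((kind, (height, width, channels)))
    return shapes


for kind, shape in trace_shapes():
    print(kind, shape)
```

The last max pooling layer uses stride 1, which is why the spatial size stays at 13x13 from that point on.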


2.3. Transfer learning with faster R-CNN method 

The faster R-CNN method uses the region proposal network (RPN) to increase the speed of
object recognition [31]-[34]. The RPN receives as input a feature map produced
by convolution. The convolution is carried out using a CNN architecture; in this
research, we use the Inception V2 architecture, which is designed to reduce CNN
complexity [35]. The Inception V2 pre-trained model is used in the transfer learning process, as can be
seen in Figure 1 (blue box).

Transfer learning requires a batch size and a learning rate. We use the same values
as in the YOLO method, i.e. a batch size of 1 and a learning rate of 0.0002. The
batch size is the number of training samples processed in each training step. The learning rate is the
amount by which the model changes during each step of the search process [35]. The learning rate thus
controls how the model learns to detect fire [36]. After
that, we use the loss function in (4) to determine the performance of the model [37]:
$$L(\{pr_i\}, \{tr_i\}) = \frac{1}{N_{\text{clas}}} \sum_i L_{\text{clas}}(pr_i, pr_i^*) + \gamma \frac{1}{N_{\text{regr}}} \sum_i pr_i^* L_{\text{regr}}(tr_i, tr_i^*) \qquad (4)$$

where i is the index of an anchor, $pr_i$ is the predicted probability of the anchor, and $pr_i^*$ is the
ground truth label, i.e. $pr_i^* = 1$ for a positive label and $pr_i^* = 0$ for a negative label; $tr_i$
is the coordinates of the bounding box of the anchor, $tr_i^*$ is the ground truth box, $L_{\text{clas}}$
is the log loss, $N_{\text{clas}}$ is the classifier normalization value, set to 256,
and $N_{\text{regr}}$ is the regression normalization value, set to 2,400. The regression and classifier
terms are balanced by multiplying the regression term by $\gamma$ [37].
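
The loss in (4) can be sketched in a few lines. The text defines the classification term as a log loss but does not define the regression term, so the smooth L1 function below is an assumption borrowed from the usual faster R-CNN formulation, as is the default balancing weight `gamma=10.0`; the function names are hypothetical.

```python
import math


def log_loss(pr, pr_star):
    """Binary log loss for one anchor; pr is clipped to avoid log(0)."""
    eps = 1e-12
    pr = min(max(pr, eps), 1.0 - eps)
    return -(pr_star * math.log(pr) + (1 - pr_star) * math.log(1.0 - pr))


def smooth_l1(tr, tr_star):
    """Smooth L1 distance between two boxes (assumed form of the regression loss)."""
    total = 0.0
    for a, b in zip(tr, tr_star):
        d = abs(a - b)
        total += 0.5 * d * d if d < 1 else d - 0.5
    return total


def rpn_loss(pr, pr_star, tr, tr_star, n_clas=256, n_regr=2400, gamma=10.0):
    """Equation (4): normalized classification loss plus weighted regression loss."""
    clas = sum(log_loss(p, s) for p, s in zip(pr, pr_star)) / n_clas
    # The label pr_star switches the regression term off for negative anchors
    regr = sum(s * smooth_l1(t, ts) for s, t, ts in zip(pr_star, tr, tr_star)) / n_regr
    return clas + gamma * regr


print(rpn_loss([0.9, 0.2], [1, 0], [(0.1, 0.1, 0.2, 0.2), (0, 0, 0, 0)],
               [(0.0, 0.0, 0.0, 0.0), (1, 1, 1, 1)]))
```

Multiplying the regression loss by the ground truth label means that only positive anchors are penalized for box errors.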

When the loss value is less than 1 or close to 0, the transfer learning process finishes [7]. This process
produces a model that can recognize fire hotspots. For both the YOLO method and the
faster R-CNN method, the loss value can thus be used to obtain a good model for detecting fire hotspots. In
the final stage, we use the new model to predict the testing data.


3. RESULTS AND DISCUSSION 

This section describes the results of the YOLO method and the faster R-CNN method in detecting
fire hotspots. The YOLO method divides the input image into a grid of size SxS. The image pieces go
through a convolution process. The YOLO architecture has 24 convolutional layers, 4 max pooling layers, and 2
fully connected layers, which produce a grid of values used in the classification process. If the
number of grid cells is very large and the convolution process takes a long time, the computation becomes
very heavy.

Meanwhile, the faster R-CNN method uses the RPN to propose areas (parts of an image to be
observed or predicted as objects to be detected). The RPN produces several bounding boxes, each with
2 probability scores indicating whether or not there is an object at that location. The resulting areas are
the input of the classification process. The RPN can reduce the computational requirements significantly
because the image does not have to be divided into a grid. In this section, we
explain the training dataset, the test results on the testing dataset using the YOLO method and the faster
R-CNN method, and the evaluation results using precision, recall, accuracy, and F1 score.


3.1. Training dataset 

In this study, we use 70 training images obtained from various sites. The training dataset is a
collection of images containing fire objects. The model uses these data to learn about the fire objects
they contain. Some of the training images used in this study can be seen in Figure 3.


Figure 3. Sample of training dataset (20 of 70 training dataset) 

3.2. Testing results of the YOLO model

After we get the training dataset, the next stage is to label the images by drawing a bounding box around
the object we want to recognize, i.e. fire. After that, we perform transfer learning using a pre-trained
model, i.e. the tiny YOLOv3 model, so that the model can learn the fire object. The learning process continues
until the loss value is less than 1 or the number of steps reaches the desired limit, which in this study is
10,000 steps. The loss value of the training model on the YOLO method can be seen in Figure 4.
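
The stopping rule described above can be sketched as a simple loop. The callable `train_step` is a hypothetical stand-in for one training update that returns the current loss; the sketch illustrates the rule only and is not the training code used in this study.

```python
def train_until(train_step, loss_threshold=1.0, max_steps=10_000):
    """Run training steps until the loss drops below the threshold
    or the step limit is reached; return (steps taken, last loss)."""
    step = 0
    loss = float("inf")
    while step < max_steps:
        loss = train_step(step)  # hypothetical: performs one update, returns the loss
        step += 1
        if loss < loss_threshold:
            break
    return step, loss


# Toy loss curve that decays with the step count, for illustration only
steps, final_loss = train_until(lambda step: 10.0 / (step + 1))
print(steps, final_loss)
```

With this rule, a model whose loss never falls below the threshold still stops at the step limit, which is the case reported for the YOLO model in this study.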


Figure 4. The loss value of training model on the YOLO method 


Figure 4 shows that the loss value at step 10,000 is still greater than 1, which
indicates that the YOLO model performs poorly. Next, fire hotspots are detected in the
testing data. The detection results of the YOLO method can be seen in
Figure 5. In Figure 5, there are 12 images in which fire is present and is declared as fire by
the application (true positive), 0 images in which no fire is present but fire is declared by the
application (false positive), 10 images in which fire is present but is not declared by the application
(false negative), and 8 images in which no fire is present and none is declared by the application
(true negative).


3.3. Testing results of the faster R-CNN model 

In the next stage, we perform transfer learning using the second method, i.e. the faster R-CNN method.
The pre-trained model used is the faster R-CNN Inception V2. The learning process continues until
the loss value is less than 1 or the number of steps reaches the desired limit, which in this study is
10,000 steps. The loss value of the training model on the faster R-CNN method can be seen in Figure 6.

Figure 6 shows that with the same number of steps, i.e. 10,000 steps, the loss value is
close to 0, which indicates that the faster R-CNN model performs very well. Next, fire hotspots are
detected in the testing data. The detection results of the faster R-CNN
method are shown in Figure 7. In Figure 7, there are 21 images in which fire is present
and is declared as fire by the application (true positive), 3 images in which no fire is present
but fire is declared by the application (false positive), 1 image in which fire is present
but is not declared by the application (false negative), and 5 images in which no fire is present and
none is declared by the application (true negative).

Figure 5. The prediction results of fire objects using the YOLO method

Figure 6. Loss value of training model on the faster R-CNN method

Figure 7. Prediction results of fire hotspots using the faster R-CNN method


3.4. Evaluation results of the model 
In this study, we calculate precision, recall, accuracy, and F1 score to measure the performance of the
YOLO method and the faster R-CNN method. The formulas are given in (5), (6), (7), and (8) [38]-[40]:


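$$\text{Precision} = \frac{TP}{FP + TP} \times 100\% \qquad (5)$$

$$\text{Recall} = \frac{TP}{FN + TP} \times 100\% \qquad (6)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (7)$$

$$\text{F1 score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \qquad (8)$$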


where FN is false negative, TN is true negative, TP is true positive, and FP is false positive. The evaluation
results of the YOLO method and the faster R-CNN method are shown in Table 2. The YOLO method has a
precision of 100%, a recall of 54.54%, an accuracy of 66.67%, and an F1 score of
0.70583667, while the faster R-CNN method has a precision of 87.5%, a recall of 95.45%, an accuracy of
86.67%, and an F1 score of 0.913022. The precision of the YOLO method is better than that of the faster
R-CNN method, but the recall, accuracy, and F1 score of the faster R-CNN method are better than those of the
YOLO method.


Table 2. Evaluation results of YOLO and faster R-CNN methods 


Indicator        YOLO      Faster R-CNN
True Positive    12        21
False Positive   0         3
False Negative   10        1
True Negative    8         5
Precision        100%      87.5%
Recall           54.54%    95.45%
Accuracy         66.67%    86.67%
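
The indicators in Table 2 follow directly from the four counts through (5) to (8), as the short sketch below shows. The function name `evaluate` is hypothetical; the counts are those reported in Table 2.

```python
def evaluate(tp, fp, fn, tn):
    """Compute the indicators of equations (5) to (8) from confusion counts."""
    precision = tp / (fp + tp)
    recall = tp / (fn + tp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return {
        "precision_pct": precision * 100,
        "recall_pct": recall * 100,
        "accuracy_pct": accuracy * 100,
        "f1": f1,
    }


yolo = evaluate(tp=12, fp=0, fn=10, tn=8)
faster_rcnn = evaluate(tp=21, fp=3, fn=1, tn=5)
print(yolo)
print(faster_rcnn)
```

Computed from the unrounded counts, the F1 scores are approximately 0.7059 for YOLO and 0.9130 for faster R-CNN; the reported values of 0.70583667 and 0.913022 are consistent with a calculation that starts from the rounded recall percentages.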


The YOLO method is very good at detecting fire hotspots when the image data are
uniform (training and testing images do not differ much). However, when the image data are random
(training and testing images are very different), the YOLO method does not detect fire
hotspots well. Therefore, for random image data, we suggest the faster R-CNN method because it
detects fire hotspots very well.


4. CONCLUSION 

In high-rise buildings, fire object detection is needed to determine whether a room has a fire so
that it can be handled immediately by the fire department. In this study, we compared fire detection using
2 methods, i.e. the YOLO method and the faster R-CNN method. The data consisted of
100 images containing fire objects, divided into 70 training data and 30 testing data. We then
trained the models so that they could learn and recognize fire objects. The next stage was to make
predictions using the testing dataset. The results show that the YOLO method has an accuracy of
66.67% and the faster R-CNN method has an accuracy of 86.67%. This indicates that the faster R-CNN
method performs better than the YOLO method. In future research, training with more types of
backgrounds will be added.